Smart Paper Notes

UFO: Unified Feature Optimization

Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang, Zhigang Wang, Jinwen Chen, Jian Wang

Category: Computer Vision

2022-07-21

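This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models in real-world, large-scale scenarios that require a collection of multiple AI functions. UFO aims to benefit each individual task through large-scale pretraining on all tasks. Compared with well-known foundation models, UFO has two distinct points of emphasis, namely a relatively small model size and no adaptation cost: 1) UFO squeezes a wide range of tasks into a moderate-sized unified model in a multi-task learning manner and further trims the model size when transferring to downstream tasks. 2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the trimmed model dedicated to one or more already-seen tasks. With these two characteristics, UFO provides great convenience for flexible deployment while retaining the benefits of large-scale pretraining. A key merit of UFO is that the trimming process not only reduces model size and inference cost but also improves accuracy on certain tasks. Specifically, UFO considers multi-task training, which has a two-fold impact on the unified model: some closely related tasks benefit each other, while some tasks conflict with each other. UFO manages to reduce the conflicts and preserve the mutual benefits through a novel Network Architecture Search (NAS) method. Experiments on a wide range of deep representation learning tasks (i.e., face recognition, person re-identification, vehicle re-identification, and product retrieval) show that the model trimmed from UFO achieves higher accuracy than its single-task-trained counterpart while having a smaller model size, validating the concept of UFO. In addition, UFO supported the release of a 17-billion-parameter computer vision (CV) foundation model, which is the largest CV model in the industry.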


TENET: Transformer Encoding Network for Effective Temporal Flow on Motion Prediction

Yuting Wang, Hangning Zhou, Zhigang Zhang, Chen Feng, Huadong Lin, Chaofei Gao, Yizhi Tang, Zhenting Zhao, Shiyu Zhang, Jie Guo

Category: Computer Vision | Artificial Intelligence

2022-06-30

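This technical report presents an effective method for motion prediction in autonomous driving. We develop a Transformer-based method for input encoding and trajectory prediction. In addition, we propose a Temporal Flow Header to enhance the trajectory encoding. Finally, an efficient K-means ensemble method is used. Using our Transformer network and ensemble method, we won first place in the Argoverse 2 Motion Forecasting Challenge with a state-of-the-art brier-minFDE score of 1.90.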


JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding

Wayne Xin Zhao, Kun Zhou, Zheng Gong, Beichen Zhang, Yuanhang Zhou, Jing Sha, Zhigang Chen, Shijin Wang, Cong Liu, Ji-Rong Wen

Category: Natural Language Processing | Artificial Intelligence

2022-06-13

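This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM) for effectively understanding and representing mathematical problems. Unlike other standard NLP tasks, mathematical texts are difficult to understand because their problem statements involve mathematical terminology, symbols, and formulas. Solving mathematical problems typically requires complex mathematical logic and background knowledge. Considering the complex nature of mathematical texts, we design a novel curriculum pre-training approach, consisting of both basic and advanced courses, to improve the learning of mathematical PLMs. Specifically, we first perform token-level pre-training based on a position-biased masking strategy, and then design logic-based pre-training tasks that aim to recover shuffled sentences and shuffled formulas, respectively. Finally, we introduce a more difficult pre-training task that requires the PLM to detect and correct the errors in its own generated solutions. We conduct extensive experiments on offline evaluation (including nine math-related tasks) and an online A/B test. Experimental results demonstrate the effectiveness of our approach compared with a number of competitive baselines. Our code is available at: https://github.com/rucaibox/jiuzhang.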


Distilling Inter-Class Distance for Semantic Segmentation

Zhengbo Zhang, Chunluan Zhou, Zhigang Tu

Category: Computer Vision

2022-05-07

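Knowledge distillation is widely adopted in semantic segmentation to reduce computation cost. Previous knowledge distillation methods for semantic segmentation focus on pixel-wise feature alignment and intra-class feature variation distillation, while neglecting the inter-class distance in the feature space, which is important for semantic segmentation. To address this issue, we propose an Inter-class Distance Distillation (IDD) method that transfers the inter-class distance in the feature space from the teacher network to the student network. Furthermore, semantic segmentation is a position-dependent task, so we use a position information distillation module to help the student network encode more position information. Extensive experiments on three popular datasets (Cityscapes, Pascal VOC, and ADE20K) show that our method helps improve the accuracy of semantic segmentation models and achieves state-of-the-art performance. For example, it improves the accuracy of the benchmark model ("PSPNet+ResNet18") by 7.50% on the Cityscapes dataset.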
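The idea of transferring inter-class distances from teacher to student can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the loss, so the sketch assumes class centroids, Euclidean distance between centroids, and a mean squared difference between the teacher's and student's distance values. The function names `class_centroids`, `inter_class_distances`, and `idd_loss` are hypothetical.

```python
import math


def class_centroids(features, labels, num_classes):
    """Return the mean feature vector of each class (None for absent classes)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, lab in zip(features, labels):
        counts[lab] += 1
        for i, value in enumerate(vec):
            sums[lab][i] += value
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]


def inter_class_distances(centroids):
    """Euclidean distance between every pair of classes that are present."""
    dists = {}
    for a in range(len(centroids)):
        for b in range(a + 1, len(centroids)):
            if centroids[a] is None or centroids[b] is None:
                continue
            dists[(a, b)] = math.dist(centroids[a], centroids[b])
    return dists


def idd_loss(teacher_feats, student_feats, labels, num_classes):
    """Assumed loss: mean squared gap between teacher and student distances."""
    teacher = inter_class_distances(
        class_centroids(teacher_feats, labels, num_classes))
    student = inter_class_distances(
        class_centroids(student_feats, labels, num_classes))
    if not teacher:
        return 0.0  # fewer than two classes present, nothing to match
    return sum((teacher[k] - student[k]) ** 2 for k in teacher) / len(teacher)
```

Because only distances are compared, the teacher and student features may have different dimensionality in this sketch; a real segmentation pipeline would compute the centroids from per-pixel feature maps.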


SnapshotNet: Self-supervised Feature Learning for Point Cloud Data Segmentation Using Minimal Labeled Data

Xingye Li, Ling Zhang, Zhigang Zhu

Category: Computer Vision

2022-01-13

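Manually annotating complex scene point cloud datasets is both costly and error-prone. To reduce the reliance on labeled data, a new model called SnapshotNet is proposed as a self-supervised feature learning approach that works directly on the unlabeled point cloud data of a complex 3D scene. The SnapshotNet pipeline includes three stages. In the snapshot capturing stage, snapshots, which are defined as local collections of points, are sampled from the point cloud scene. A snapshot can be a view of a local 3D scan captured directly from the real scene, or a virtual view from a large 3D point cloud dataset. Snapshots can also be sampled at different sampling rates or fields of view (FOVs) to capture scale information from the scene. In the feature learning stage, a new pretext task called multi-FOV contrasting is proposed to recognize whether or not two snapshots are from the same object, either within the same FOV or across different FOVs. Snapshots go through two self-supervised learning steps: a contrastive learning step with both part and scale contrasting, followed by a snapshot clustering step to extract higher-level semantic features. A weakly supervised segmentation stage is then implemented by first training a standard SVM classifier on the learned features with a small fraction of labeled snapshots. The trained SVM is used to predict labels for input snapshots, and the predicted labels are converted into point-wise label assignments for semantic segmentation of the entire scene using a voting procedure. Experiments are conducted on the Semantic3D dataset, and the results show that the proposed method is capable of learning effective features from snapshots of complex scene data without any labels. Moreover, the proposed method shows advantages over the state-of-the-art method for weakly supervised point cloud semantic segmentation.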
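The final voting step, which turns per-snapshot predictions into point-wise labels, might look like the following minimal sketch. The abstract does not specify the voting rule, so a simple majority vote is assumed: each snapshot casts one vote for its predicted label on every point it contains. The function name `vote_point_labels`, the integer point indices, the tie-breaking rule, and the `unlabeled` sentinel are all hypothetical.

```python
from collections import Counter, defaultdict


def vote_point_labels(snapshots, snapshot_labels, num_points, unlabeled=-1):
    """Assign each point the label most often predicted for snapshots containing it.

    snapshots:       list of lists of point indices (one list per snapshot)
    snapshot_labels: predicted integer label for each snapshot
    num_points:      total number of points in the scene
    unlabeled:       value given to points not covered by any snapshot
    """
    votes = defaultdict(Counter)
    for point_ids, label in zip(snapshots, snapshot_labels):
        for pid in point_ids:
            votes[pid][label] += 1

    point_labels = []
    for pid in range(num_points):
        if pid not in votes:
            point_labels.append(unlabeled)
            continue
        # Highest vote count wins; ties go to the smallest label (an assumption).
        best_label, _ = max(votes[pid].items(), key=lambda kv: (kv[1], -kv[0]))
        point_labels.append(best_label)
    return point_labels


if __name__ == "__main__":
    snaps = [[0, 1, 2], [1, 2, 3], [2, 3]]
    print(vote_point_labels(snaps, [0, 1, 1], num_points=5))
```

Points that fall in overlapping snapshots receive several votes, which is what lets a handful of snapshot-level predictions cover the whole scene.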


Multimodal Representations Learning Based on Mutual Information Maximization and Minimization and Identity Embedding for Multimodal Sentiment Analysis

Jiahao Zheng, Sen Zhang, Xiaoping Wang, Zhigang Zeng

Category: Machine Learning | Natural Language Processing | Computer Vision

2022-01-10

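Multimodal sentiment analysis (MSA) is a fundamental and complex research problem because of the heterogeneity gap between different modalities and the ambiguity of human emotional expression. Although multimodal representations for MSA have been constructed successfully, two challenges remain: 1) a more robust multimodal representation is needed to bridge the heterogeneity gap and cope with complex multimodal interactions, and 2) contextual dynamics must be modeled effectively throughout the information flow. In this work, we propose a multimodal representation model based on Mutual information Maximization and Minimization and Identity Embedding (MMMIE). We combine mutual information maximization between modality pairs with mutual information minimization between the input data and the corresponding features to mine modality-invariant and task-related information. Furthermore, Identity Embedding is proposed to prompt the downstream network to perceive contextual information. Experimental results on two public datasets demonstrate the effectiveness of the proposed model.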


Knowledge Enhanced Sports Game Summarization

Jiaan Wang, Zhixu Li, Tingyi Zhang, Duo Zheng, Jianfeng Qu, An Liu, Lei Zhao, Zhigang Chen

Category: Natural Language Processing | Artificial Intelligence

2021-11-24

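Sports game summarization aims to generate sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, resulting in a lot of noise. In addition, current works neglect the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from massive games. It has 7,854 commentary-news pairs. To improve quality, K-SportsSum employs a manual cleaning process; (2) unlike existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus that contains information on 523 sports teams and 14,724 sports players. We also introduce a knowledge-enhanced summarizer that uses both live commentaries and the knowledge to generate sports news. Extensive experiments on the K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performance. Qualitative analysis and a human study further verify that our model generates more informative sports news.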


Federated Multi-Agent Deep Reinforcement Learning Approach via Physics-Informed Reward for Multi-Microgrid Energy Management

Yuanzheng Li, Shangyang He, Yang Li, Yang Shi, Zhigang Zeng

Category: Machine Learning

2022-12-29

The utilization of large-scale distributed renewable energy promotes the development of the multi-microgrid (MMG), which raises the need for an effective energy management method that minimizes economic costs and maintains energy self-sufficiency. Multi-agent deep reinforcement learning (MADRL) has been widely used for the energy management problem because of its real-time scheduling ability. However, its training requires massive energy operation data from microgrids (MGs), and gathering these data from different MGs would threaten their privacy and data security. Therefore, this paper tackles this practical yet challenging issue by proposing a federated multi-agent deep reinforcement learning (F-MADRL) algorithm via a physics-informed reward. In this algorithm, the federated learning (FL) mechanism is introduced to train the F-MADRL algorithm, thereby ensuring the privacy and security of data. In addition, a decentralized MMG model is built, and the energy of each participating MG is managed by an agent, which aims to minimize economic costs and maintain energy self-sufficiency according to the physics-informed reward. First, MGs individually perform self-training on local energy operation data to train their local agent models. Then, these local models are periodically uploaded to a server, and their parameters are aggregated to build a global agent, which is broadcast to the MGs and replaces their local agents. In this way, the experience of each MG agent can be shared while the energy operation data is not explicitly transmitted, thus protecting privacy and ensuring data security. Finally, experiments are conducted on the Oak Ridge National Laboratory Distributed Energy Control Communication Lab microgrid (ORNL-MG) test system, and comparisons are carried out to verify the effectiveness of introducing the FL mechanism and the superior performance of the proposed F-MADRL.
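The upload, aggregate, and broadcast cycle described above can be sketched in a few lines. The abstract says only that the local parameters "are aggregated"; this sketch assumes an unweighted element-wise mean (FedAvg-style), which may differ from the paper's rule. Models are represented as plain dictionaries mapping parameter names to lists of floats, and the names `aggregate_parameters` and `federated_round` are hypothetical.

```python
def aggregate_parameters(local_models):
    """Build global parameters as the element-wise mean of the local ones.

    local_models: list of dicts, each mapping a parameter name to a list of
    floats. All models are assumed to share the same names and shapes.
    """
    n = len(local_models)
    global_model = {}
    for name in local_models[0]:
        columns = zip(*(model[name] for model in local_models))
        global_model[name] = [sum(values) / n for values in columns]
    return global_model


def federated_round(local_models):
    """One communication round: aggregate on the server, then broadcast.

    Only parameters are exchanged; the local energy operation data never
    leaves each microgrid. Every microgrid receives its own copy of the
    global agent, which replaces its local agent.
    """
    global_model = aggregate_parameters(local_models)
    return [
        {name: list(values) for name, values in global_model.items()}
        for _ in local_models
    ]


if __name__ == "__main__":
    agents = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [1.0]},
        {"w": [5.0, 6.0], "b": [2.0]},
    ]
    print(federated_round(agents)[0])
```

In a full training loop, each microgrid would resume local self-training from the broadcast copy before the next round.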


InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation

Guoqing Lv, Xiaoping Wang, Jiang Li, Zhigang Zeng

Category: Natural Language Processing

2022-12-13

Current approaches to empathetic response generation typically encode the entire dialogue history directly and feed the output into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect to capture the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head-attention-based intention fusion module to capture the speaker's intention. In addition, we use the previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimization rates of utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.


WIDER & CLOSER: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition

Jun-Yu Ma, Beiduo Chen, Jia-Chen Gu, Zhen-Hua Ling, Wu Guo, Quan Liu, Zhigang Chen, Cong Liu

Category: Natural Language Processing

2022-12-07

Zero-shot cross-lingual named entity recognition (NER) aims to transfer knowledge from annotated, resource-rich data in source languages to unlabeled, resource-lean data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to let the rich hierarchical information in the teacher model interact fully and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. In addition, an unsupervised method adopting parallel domain adaptation is proposed to shorten the channels between the teacher and student models and thus preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows strong generalization and compatibility across languages and fields.
